APPENDIX
A. PROOFS

A.1 Proof of Statement 1

Proof. Each coordinate $N \cdot s'_{l'}$ of the vector in (4) is, by definition of partial supports, just the number of transactions in the randomized sequence $T'$ that have intersections with $A$ of size $l'$. Each randomized transaction $t'_i$ contributes to one and only one coordinate $N \cdot s'_{l'}$, namely to the one with $l' = \#(t'_i \cap A)$. Since we are dealing with a per-transaction randomization, different randomized transactions contribute independently to one of the coordinates. Moreover, by the item-invariance assumption, the probability that a given randomized transaction contributes to coordinate number $l'$ depends only on the size of the original transaction $t$ (which equals $m$) and the size $l$ of the intersection $t \cap A$. This probability equals $P[l \to l']$.

So, for all transactions in $T$ that have intersections with $A$ of the same size $l$ (and there are $N \cdot s_l$ such transactions), the probabilities of contributing to the various coordinates $N \cdot s'_{l'}$ are the same. We can split all $N$ transactions into $k + 1$ groups according to the size of their intersection with $A$. Each group contributes to the vector in (4) as a multinomial distribution with probabilities

$$\big(P[l \to 0],\ P[l \to 1],\ \ldots,\ P[l \to k]\big),$$

independently of the other groups. Therefore the vector in (4) is a sum of $k + 1$ independent multinomials, and it is easy to compute both its expectation and its covariance.

For a multinomial distribution $(X_0, X_1, \ldots, X_k)$ with probabilities $(p_0, p_1, \ldots, p_k)$, where $X_0 + X_1 + \ldots + X_k = n$, we have $E\,X_i = n \cdot p_i$ and

$$\operatorname{Cov}(X_i, X_j) = E\,(X_i - n p_i)(X_j - n p_j) = n \cdot \big(p_i\,\delta_{i=j} - p_i\,p_j\big).$$

In our case, $X_i$ is the part of $N \cdot s'_i$ contributed by the $l$-th group, $n = N \cdot s_l$, and $p_i = P[l \to i]$. For a sum of independent multinomial distributions, their expectations and covariances add together:


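$$E\,(N \cdot s'_{l'}) = \sum_{l=0}^{k} N \cdot s_l \cdot P[l \to l'],$$

$$\operatorname{Cov}(N \cdot s'_i,\ N \cdot s'_j) = \sum_{l=0}^{k} N \cdot s_l \cdot \big(P[l \to i]\,\delta_{i=j} - P[l \to i]\,P[l \to j]\big).$$

Thus, after dividing by an appropriate power of $N$, the formulae in the statement are proven. □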
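As a numerical sanity check of these two formulas, the following Python sketch (using NumPy) builds the expectation and covariance of the count vector $N \cdot \vec{s}\,'$ as a sum of $k + 1$ multinomial contributions and compares them with a Monte Carlo simulation. The function names, the example matrix `P` and the partial supports `s` are hypothetical; the code assumes the convention `P[lp, l]` $= P[l \to l']$, so that column `l` holds the probability vector of group `l`.

```python
import numpy as np

def count_moments(P, s, N):
    """Mean and covariance of the count vector N*s' (Statement 1).

    Assumed convention: P[lp, l] = P[l -> l'], so column l of P is the
    multinomial probability vector of the group with #(t & A) = l,
    and s[l] is the original partial support.
    """
    P = np.asarray(P, dtype=float)
    s = np.asarray(s, dtype=float)
    mean = N * (P @ s)  # sum over groups of n_l * p_l
    cov = np.zeros((P.shape[0], P.shape[0]))
    for l, s_l in enumerate(s):
        p_l = P[:, l]
        # Multinomial covariance of one group: n (diag(p) - p p^T), n = N s_l.
        cov += N * s_l * (np.diag(p_l) - np.outer(p_l, p_l))
    return mean, cov

def simulate_counts(P, s, N, trials, rng):
    """Draw N*s' `trials` times as a sum of independent multinomials."""
    P = np.asarray(P, dtype=float)
    group_sizes = np.rint(N * np.asarray(s)).astype(int)
    counts = np.zeros((trials, P.shape[0]))
    for l, n_l in enumerate(group_sizes):
        counts += rng.multinomial(n_l, P[:, l], size=trials)
    return counts

# Hypothetical example with k = 2 (three coordinates).
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
s = np.array([0.5, 0.3, 0.2])
N = 1000
mean, cov = count_moments(P, s, N)
samples = simulate_counts(P, s, N, trials=20000, rng=np.random.default_rng(0))
```

Because the coordinates of $N \cdot \vec{s}\,'$ always sum to $N$, every row of the covariance matrix sums to zero, which gives a simple extra check.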

A.2 Proof of Statement 2 



Proof. We are given a transaction $t \in T$ and an itemset $A \subseteq I$ such that $|t| = m$, $|A| = k$, and $\#(t \cap A) = l$. At the beginning of randomization, a number $j$ is selected with distribution $\{p_m[j]\}$, and this is what the first summation takes care of. Now assume that we retain exactly $j$ items of $t$ and discard $m - j$ items.

Suppose there are $q$ items from $t \cap A$ among the retained items. How likely is this? Well, there are $\binom{m}{j}$ possible ways to choose $j$ items from transaction $t$, and there are $\binom{l}{q}\binom{m-l}{j-q}$ possible ways to choose $q$ items from $t \cap A$ and $j - q$ items from $t \setminus A$. Since all choices are equiprobable, we get $\binom{l}{q}\binom{m-l}{j-q} \big/ \binom{m}{j}$ as the probability that exactly $q$ $A$-items are retained.

To make $t'$ contain exactly $l'$ items from $A$, we have to get an additional $l' - q$ items from $A \setminus t$. We know that $\#(A \setminus t) = k - l$ and that any such item has probability $\rho$ to get into $t'$. The last terms in (8) immediately follow. The summation bounds restrict $q$ to its actually possible (= nonzero probability) values. □
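Putting the pieces of this argument together gives $P[l \to l']$ as a double sum over $j$ and $q$. Since (8) itself is not reproduced in this section, the Python sketch below reassembles the probability from the proof alone and should be checked against (8); the function name `transition_prob` and the list layout of `p_m` are hypothetical.

```python
from math import comb

def transition_prob(l, l_prime, m, k, rho, p_m):
    """P[l -> l'] for select-a-size, reassembled from the proof of Statement 2.

    l: #(t & A) for an original transaction of size m; |A| = k.
    rho: probability that an item not in t is inserted into t'.
    p_m: hypothetical list with p_m[j] = probability of retaining exactly j items.
    """
    total = 0.0
    for j in range(m + 1):
        # q = number of retained items from t & A; the bounds keep
        # both j - q >= 0 and j - q <= m - l (nonzero probability only).
        for q in range(max(0, j + l - m), min(j, l) + 1):
            retained = comb(l, q) * comb(m - l, j - q) / comb(m, j)
            inserted = l_prime - q  # items that must come from A \ t
            if not 0 <= inserted <= k - l:
                continue
            total += (p_m[j] * retained * comb(k - l, inserted)
                      * rho ** inserted * (1 - rho) ** (k - l - inserted))
    return total

# Hypothetical parameters: m = 4, k = 3, rho = 0.2.
p_m = [0.0, 0.1, 0.2, 0.3, 0.4]
row_sums = [sum(transition_prob(l, lp, 4, 3, 0.2, p_m) for lp in range(4))
            for l in range(4)]
```

For every $l$, the probabilities $P[l \to l']$ over $l' = 0, \ldots, k$ should sum to one, and with $\rho = 0$ and all $m$ items always retained the randomization is the identity, so $P[l \to l] = 1$.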

A.3 Proof of Statement 3



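Proof. Let us denote

$$\vec{p}_l := \big(P[l \to 0],\ P[l \to 1],\ \ldots,\ P[l \to k]\big)^T, \qquad \vec{q}_l := \big(q[l \leftarrow 0],\ q[l \leftarrow 1],\ \ldots,\ q[l \leftarrow k]\big)^T.$$

Since $PQ = QP = I$ (where $I$ is the identity matrix), we have

$$\sum_{l'=0}^{k} P[l \to l']\, q[i \leftarrow l'] = \sum_{l'=0}^{k} P[l' \to l]\, q[l' \leftarrow i] = \delta_{l=i},$$

and in particular $\vec{q}_i^{\,T}\,\vec{p}_l = \delta_{l=i}$.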
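Notice also, from (7), that matrix $D[l]$ can be written as

$$D[l] = \operatorname{diag}(\vec{p}_l) - \vec{p}_l\,\vec{p}_l^{\,T},$$

where $\operatorname{diag}(\vec{p}_l)$ denotes the diagonal matrix with the coordinates of $\vec{p}_l$ as its diagonal elements. Since $\vec{s}_{\mathrm{est}} := Q\,\vec{s}\,'$ and, by Statement 1, $\operatorname{Cov}\vec{s}\,' = \frac{1}{N}\sum_{l} s_l\, D[l]$, it is easy to see that

$$\begin{aligned}
(\operatorname{Var}\vec{s}_{\mathrm{est}})_{ij}
&= \frac{1}{N}\sum_{l=0}^{k} s_l\,\vec{q}_i^{\,T} D[l]\,\vec{q}_j
 = \frac{1}{N}\sum_{l=0}^{k} s_l\,\vec{q}_i^{\,T}\big(\operatorname{diag}(\vec{p}_l) - \vec{p}_l\,\vec{p}_l^{\,T}\big)\vec{q}_j \\
&= \frac{1}{N}\sum_{l=0}^{k} s_l\Big(\sum_{l'=0}^{k} P[l \to l']\,q[i \leftarrow l']\,q[j \leftarrow l'] - \delta_{l=i}\,\delta_{l=j}\Big).
\end{aligned}$$

It remains to check that the estimator formula is unbiased. Using $E\,s'_{l'} = \sum_{l} s_l\,P[l \to l']$ (Statement 1) and $s_i = \sum_{l'} q[i \leftarrow l']\,E\,s'_{l'}$ (since $QP = I$), we have:

$$\begin{aligned}
(\operatorname{Var}\vec{s}_{\mathrm{est}})_{ij}
&= \frac{1}{N}\sum_{l'=0}^{k}\Big(\sum_{l=0}^{k} s_l\,P[l \to l']\Big)\, q[i \leftarrow l']\,q[j \leftarrow l'] - \frac{1}{N}\,\delta_{i=j}\,s_i \\
&= \frac{1}{N}\sum_{l'=0}^{k} E\,s'_{l'}\,\big(q[i \leftarrow l']\,q[j \leftarrow l'] - \delta_{i=j}\,q[i \leftarrow l']\big) \\
&= E\,\frac{1}{N}\sum_{l'=0}^{k} s'_{l'}\,\big(q[i \leftarrow l']\,q[j \leftarrow l'] - \delta_{i=j}\,q[i \leftarrow l']\big).
\end{aligned}$$

□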
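The unbiasedness argument can be checked numerically: substituting the expected randomized supports $E\,\vec{s}\,' = P\,\vec{s}$ into the estimator formula should reproduce the exact covariance $\frac{1}{N}\sum_l s_l\,Q D[l] Q^T$. The Python sketch below does this with NumPy; the function names and the example matrix are hypothetical, and it assumes `P[lp, l]` $= P[l \to l']$ and `Q[i, lp]` $= q[i \leftarrow l']$ with $Q = P^{-1}$.

```python
import numpy as np

def exact_estimator_cov(P, s, N):
    """(1/N) * sum_l s_l * Q D[l] Q^T, Q = P^{-1}, D[l] = diag(p_l) - p_l p_l^T."""
    P = np.asarray(P, dtype=float)
    Q = np.linalg.inv(P)
    cov = np.zeros_like(P)
    for l, s_l in enumerate(s):
        p_l = P[:, l]
        D_l = np.diag(p_l) - np.outer(p_l, p_l)
        cov += s_l * (Q @ D_l @ Q.T)
    return cov / N

def estimated_cov(P, s_rand, N):
    """Estimate from randomized supports s':
    (1/N) sum_l' s'_l' (q[i<-l'] q[j<-l'] - delta_ij q[i<-l'])."""
    Q = np.linalg.inv(np.asarray(P, dtype=float))
    est = np.zeros_like(Q)
    for lp, s_lp in enumerate(s_rand):
        q_col = Q[:, lp]  # q[i <- l'] for every i
        est += s_lp * (np.outer(q_col, q_col) - np.diag(q_col))
    return est / N

# Hypothetical column-stochastic P (columns sum to 1) and supports s.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
s = np.array([0.5, 0.3, 0.2])
N = 1000
expected_s_rand = P @ s  # E s' by Statement 1
```

Since the estimator is linear in $\vec{s}\,'$, agreement at $E\,\vec{s}\,'$ is exactly the unbiasedness shown in the last line of the derivation.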



A.4 Proof of Statement 4 

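Proof. We prove the left formula in (13) first, and then show that the right one follows from the left one. Consider $N \cdot \Sigma_l$; it equals

$$N \cdot \Sigma_l = N \cdot \sum_{C \subseteq A,\ |C| = l} \mathrm{supp}^T(C) = \sum_{C \subseteq A,\ |C| = l} \#\{t_i \in T \mid C \subseteq t_i\} = \sum_{i=1}^{N} \#\{C \subseteq A \mid |C| = l,\ C \subseteq t_i\}.$$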

t=l 

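In other words, each transaction $t_i$ should be counted as many times as the number of different $l$-sized subsets $C \subseteq A$ it contains. From simple combinatorics we know that if $j = \#(A \cap t_i)$ and $j \geq l$, then $t_i$ contains $\binom{j}{l}$ different $l$-sized subsets of $A$. Therefore,

$$N \cdot \Sigma_l = \sum_{j=l}^{k} \binom{j}{l} \cdot \#\{t_i \in T \mid \#(A \cap t_i) = j\} = N \cdot \sum_{j=l}^{k} \binom{j}{l}\, s_j,$$

and the left formula is proven. Now we can check the right formula just by replacing the $\Sigma_j$'s according to the left formula:

$$\sum_{j=l}^{k} (-1)^{j-l}\binom{j}{l}\,\Sigma_j = \sum_{j=l}^{k} (-1)^{j-l}\binom{j}{l}\sum_{q=j}^{k}\binom{q}{j}\,s_q = \sum_{q=l}^{k} s_q \sum_{j=l}^{q} (-1)^{j-l}\binom{j}{l}\binom{q}{j} = \sum_{q=l}^{k} \binom{q}{l}\,s_q \sum_{j=l}^{q} (-1)^{j-l}\binom{q-l}{j-l} = s_l,$$

since $\binom{j}{l}\binom{q}{j} = \binom{q}{l}\binom{q-l}{j-l}$ and the sum

$$\sum_{j=l}^{q} (-1)^{j-l}\binom{q-l}{j-l} = (1 - 1)^{q-l}$$

is zero whenever $q - l > 0$.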
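Both formulas in (13) can be checked by brute force on a toy dataset: compute $\Sigma_l$ directly from its definition as a sum of supports of $l$-subsets, compare it with the left formula, and verify that the right formula inverts it. The Python sketch below does this; the function names and the five toy transactions are hypothetical.

```python
from itertools import combinations
from math import comb

def partial_supports(transactions, A):
    """s_l = fraction of transactions t with #(t & A) = l, for l = 0..k."""
    s = [0.0] * (len(A) + 1)
    for t in transactions:
        s[len(A & t)] += 1 / len(transactions)
    return s

def sigma_by_definition(transactions, A):
    """Sigma_l = sum of supp(C) over all l-sized subsets C of A."""
    N = len(transactions)
    return [sum(sum(1 for t in transactions if set(C) <= t)
                for C in combinations(sorted(A), l)) / N
            for l in range(len(A) + 1)]

def sigma_from_s(s):
    """Left formula of (13): Sigma_l = sum_{j >= l} C(j, l) s_j."""
    k = len(s) - 1
    return [sum(comb(j, l) * s[j] for j in range(l, k + 1)) for l in range(k + 1)]

def s_from_sigma(sigma):
    """Right formula of (13): s_l = sum_{j >= l} (-1)^(j-l) C(j, l) Sigma_j."""
    k = len(sigma) - 1
    return [sum((-1) ** (j - l) * comb(j, l) * sigma[j] for j in range(l, k + 1))
            for l in range(k + 1)]

# Hypothetical toy data: five transactions, itemset A of size k = 3.
transactions = [{1, 2, 3}, {2, 4}, {1, 3, 5}, {5}, {1, 2, 3, 4}]
A = {1, 2, 3}
s = partial_supports(transactions, A)
```

Here the intersection sizes are 3, 1, 2, 0, 3, so the partial supports are $(0.2, 0.2, 0.2, 0.4)$, and $\Sigma_1 = 1 \cdot 0.2 + 2 \cdot 0.2 + 3 \cdot 0.4 = 1.8$, which matches the three single-item supports $3/5$ each.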



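To prove that matrix $P$ becomes lower triangular after the transformation from $\vec{s}$ and $\vec{s}\,'$ to $\vec{\Sigma}$ and $\vec{\Sigma}'$, let us find how $E\,\Sigma'_{l'}$ depends on $\vec{\Sigma}$ using the definition (12). For $|C| = l'$, the support of $C$ in $T'$ equals its partial support $s'_{l'}(C, T')$, so its expectation is given by Statement 1 applied to $C$ (here $P[l \to l']$ is taken for the itemset $C$ of size $l'$), and the partial supports $s_l(C, T)$ are then expressed through the right formula of (13) applied to $C$:

$$\begin{aligned}
E\,\Sigma'_{l'}
&= \sum_{C \subseteq A,\ |C| = l'} E\,\mathrm{supp}^{T'}(C)
 = \sum_{C \subseteq A,\ |C| = l'}\ \sum_{l=0}^{l'} P[l \to l']\, s_l(C, T) \\
&= \sum_{C \subseteq A,\ |C| = l'}\ \sum_{l=0}^{l'} P[l \to l'] \sum_{j=l}^{l'} (-1)^{j-l}\binom{j}{l}\,\Sigma_j(C, T)
 = \sum_{j=0}^{l'}\sum_{l=0}^{j} (-1)^{j-l}\binom{j}{l}\,P[l \to l'] \sum_{C \subseteq A,\ |C| = l'} \Sigma_j(C, T) \\
&= \sum_{j=0}^{l'} c_{l'j} \sum_{C \subseteq A,\ |C| = l'}\ \sum_{B \subseteq C,\ |B| = j} \mathrm{supp}^T(B)
 = \sum_{j=0}^{l'} c_{l'j} \sum_{B \subseteq A,\ |B| = j} \#\{C \mid B \subseteq C \subseteq A,\ |C| = l'\} \cdot \mathrm{supp}^T(B) \\
&= \sum_{j=0}^{l'} c_{l'j}\binom{k-j}{l'-j} \sum_{B \subseteq A,\ |B| = j} \mathrm{supp}^T(B)
 = \sum_{j=0}^{l'} \binom{k-j}{l'-j}\, c_{l'j}\,\Sigma_j,
\end{aligned}$$

where $c_{l'j} := \sum_{l=0}^{j} (-1)^{j-l}\binom{j}{l}\,P[l \to l']$. Now it is clear that only the lower triangle of the matrix can have non-zeros. □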
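The lower-triangular structure can also be observed numerically. By the left formula of (13), $\vec{\Sigma} = B\,\vec{s}$ with $B_{l j} = \binom{j}{l}$, so $E\,\vec{\Sigma}' = B P B^{-1}\,\vec{\Sigma}$. The Python sketch below builds $P$ for select-a-size from the counting argument of Statement 2 (assuming $m \geq k$) and checks that $B P B^{-1}$ has no entries above the diagonal; the function names and parameter values are hypothetical.

```python
import numpy as np
from math import comb

def select_a_size_P(m, k, rho, p_m):
    """Matrix with entry [l', l] = P[l -> l'] for select-a-size (assumes m >= k)."""
    P = np.zeros((k + 1, k + 1))
    for l in range(k + 1):
        for j in range(m + 1):
            for q in range(max(0, j + l - m), min(j, l) + 1):
                retained = comb(l, q) * comb(m - l, j - q) / comb(m, j)
                for ins in range(k - l + 1):  # items inserted from A \ t
                    P[q + ins, l] += (p_m[j] * retained * comb(k - l, ins)
                                      * rho ** ins * (1 - rho) ** (k - l - ins))
    return P

def binomial_transform(k):
    """B[l, j] = C(j, l), so that Sigma = B s (left formula of (13))."""
    return np.array([[comb(j, l) for j in range(k + 1)] for l in range(k + 1)],
                    dtype=float)

# Hypothetical parameters: transactions of size m = 5, itemset size k = 3.
m, k = 5, 3
P = select_a_size_P(m, k, rho=0.25, p_m=[0.0, 0.1, 0.2, 0.3, 0.2, 0.2])
B = binomial_transform(k)
M = B @ P @ np.linalg.inv(B)  # E Sigma' = M Sigma
```

The first row of `M` is $(1, 0, \ldots, 0)$, reflecting $\Sigma_0 = \Sigma'_0 = 1$ (the empty set is contained in every transaction).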



